Data Wrangling: Querying and Merging EPA Clean Air Markets Program Data (CAMPD)

Pa-Shun Hawkins & Robert J. Dellinger

Summer Research Project 2024

Data Wrangling: Querying and Merging EPA Clean Air Markets Program Data (CAMPD)

Data source & methods

This report explains, step by step, how we gathered, cleaned, combined, and graphed power-plant emission data from EPA’s Clean Air Markets Program Data (CAMPD). First, we downloaded the EPA’s “Facility” files, which tell us where each plant is, what it burns, what pollution controls it uses, and which regulatory programs it’s in. Next, we pulled the yearly emissions totals for sulfur dioxide (), nitrogen oxides (), and carbon dioxide ( from 1995 to 2024. We then matched those emissions back to the facility information so every record knows who emitted, where, and when. After checking that locations and county identifiers stayed intact, we rolled the data up to bigger geographies (facility, county, state) so we could map and compare trends. We saved every intermediate file and the final, cleaned datasets so anyone can rerun the work and get the same results.

Setup Packages

Set API Key and Facility Query

# Set your API key here (https://www.epa.gov/power-sector/cam-api-portal)
api_key <- Sys.getenv("EPA_API_KEY")

# API base URLs
apiUrlBase <- "https://api.epa.gov/easey"
bucketUrlBase <- "https://api.epa.gov/easey/bulk-files/"
selected_states <- c("IL", "IN", "KY", "OH", "PA", "TN", "WV")  # Example states

# Bulk files endpoint (e.g., CAMD Services)
servicesUrl <- paste0(apiUrlBase,
                      "/camd-services/bulk-files?API_KEY=", api_key)

Query EPA CAMPD Facility Data

# GET request to the bulk endpoint
res <- GET(servicesUrl)

# Handle error responses
if (res$status_code > 399) {
  errorFrame <- fromJSON(rawToChar(res$content))
  stop(paste("Error Code:", errorFrame$error$code,
             errorFrame$error$message))
}

# Convert the JSON response into a data frame
bulkFiles <- fromJSON(rawToChar(res$content))

# Confirm available file types
print(unique(bulkFiles$metadata$dataType))
## [1] "EDR"                                    
## [2] "Allowance"                              
## [3] "Mercury and Air Toxics Emissions (MATS)"
## [4] "Compliance"                             
## [5] "Facility"                               
## [6] "XML"                                    
## [7] "Emissions"
# Filter for Facility files (corrected label)
facilityFiles <- bulkFiles[bulkFiles$metadata$dataType == "Facility", ]

# Summary
message("Number of facility files to download: ", nrow(facilityFiles))
## Number of facility files to download: 34
message("Total size of facility files: ", sum(facilityFiles$megaBytes,
                                              na.rm = TRUE), " MB")
## Total size of facility files: 46.7 MB
# Set download directory
downloadDir <- out_dir_facilities

# Initialize a data table
facilityDataBulk <- data.table()

# Loop through each new file
if (nrow(facilityFiles) > 0) {
  for (i in 1:nrow(facilityFiles)) {
    s3Path <- facilityFiles[i, "s3Path"]
    filename <- facilityFiles[i, "filename"]
    fileURL <- paste0(bucketUrlBase, s3Path)
    
    message("Downloading file: ", filename, " from URL: ", fileURL)
    
    # Optionally: save to disk
    # GET(fileURL, write_disk(file.path(downloadDir, filename), overwrite = TRUE))

    # Read data directly into R and clean it
    tempData <- fread(fileURL, showProgress = TRUE)
    tempData <- clean_names(tempData)
    
    facilityDataBulk <- rbind(facilityDataBulk, tempData, fill = TRUE)
  }
  
  # Sort by year if column exists
  if ("year" %in% names(facilityDataBulk)) {
    facilityDataBulk <- facilityDataBulk %>% arrange(year)
  }
}
## Downloading file: facility-2007.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2007.csv
## Downloading file: facility-2011.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2011.csv
## Downloading file: facility-1985.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-1985.csv
## Downloading file: facility-2006.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2006.csv
## Downloading file: facility-2001.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2001.csv
## Downloading file: facility-2018.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2018.csv
## Downloading file: facility-2004.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2004.csv
## Downloading file: facility-2000.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2000.csv
## Downloading file: facility-2002.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2002.csv
## Downloading file: facility-2003.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2003.csv
## Downloading file: facility-1995.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-1995.csv
## Downloading file: facility-2023.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2023.csv
## Downloading file: facility-1998.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-1998.csv
## Downloading file: facility-1990.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-1990.csv
## Downloading file: facility-2014.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2014.csv
## Downloading file: facility-2010.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2010.csv
## Downloading file: facility-1997.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-1997.csv
## Downloading file: facility-2013.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2013.csv
## Downloading file: facility-2008.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2008.csv
## Downloading file: facility-2015.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2015.csv
## Downloading file: facility-2021.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2021.csv
## Downloading file: facility-2012.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2012.csv
## Downloading file: facility-2017.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2017.csv
## Downloading file: facility-1999.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-1999.csv
## Downloading file: facility-2009.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2009.csv
## Downloading file: facility-2025.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2025.csv
## Downloading file: facility-2016.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2016.csv
## Downloading file: facility-1980.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-1980.csv
## Downloading file: facility-2005.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2005.csv
## Downloading file: facility-1996.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-1996.csv
## Downloading file: facility-2022.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2022.csv
## Downloading file: facility-2019.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2019.csv
## Downloading file: facility-2020.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2020.csv
## Downloading file: facility-2024.csv from URL: https://api.epa.gov/easey/bulk-files/facility/facility-2024.csv
facility_data_cleaned <- facilityDataBulk %>% 
  filter(state %in% selected_states) %>% 
  clean_names()

print(facility_data_cleaned)
##         state                 facility_name facility_id unit_id
##        <char>                        <char>       <int>  <char>
##     1:     OH                 Refuse & Coal         312     001
##     2:     OH                 Refuse & Coal         312     002
##     3:     OH                 Refuse & Coal         312     003
##     4:     OH                 Refuse & Coal         312     004
##     5:     OH                 Refuse & Coal         312     005
##    ---                                                         
## 31792:     OH              Pratt Paper (OH)      880109    B001
## 31793:     TN Holston Army Ammunition Plant      880110       1
## 31794:     TN Holston Army Ammunition Plant      880110       2
## 31795:     TN Holston Army Ammunition Plant      880110       3
## 31796:     TN Holston Army Ammunition Plant      880110       4
##        associated_stacks  year program_code primary_rep_info epa_region
##                   <char> <int>       <char>           <char>      <int>
##     1:                    1980          ARP             <NA>          5
##     2:                    1980          ARP             <NA>          5
##     3:                    1980          ARP             <NA>          5
##     4:                    1980          ARP             <NA>          5
##     5:                    1980          ARP             <NA>          5
##    ---                                                                 
## 31792:                    2025       SIPNOX           610587          5
## 31793:                    2025       SIPNOX           610811          4
## 31794:                    2025       SIPNOX           610811          4
## 31795:                    2025       SIPNOX           610811          4
## 31796:                    2025       SIPNOX           610811          4
##        nerc_region          county county_code fips_code
##             <char>          <char>      <char>     <int>
##     1:             Franklin County       OH049        49
##     2:             Franklin County       OH049        49
##     3:             Franklin County       OH049        49
##     4:             Franklin County       OH049        49
##     5:             Franklin County       OH049        49
##    ---                                                  
## 31792:             Auglaize County       OH011        11
## 31793:              Hawkins County       TN073        73
## 31794:              Hawkins County       TN073        73
## 31795:              Hawkins County       TN073        73
## 31796:              Hawkins County       TN073        73
##                  source_category latitude longitude
##                           <char>    <num>     <num>
##     1: Municipal Waste Combustor    39.91    -83.01
##     2: Municipal Waste Combustor    39.91    -83.01
##     3: Municipal Waste Combustor    39.91    -83.01
##     4: Municipal Waste Combustor    39.91    -83.01
##     5: Municipal Waste Combustor    39.91    -83.01
##    ---                                             
## 31792:         Industrial Boiler    40.54    -84.19
## 31793:         Industrial Boiler    36.55    -82.63
## 31794:         Industrial Boiler    36.55    -82.63
## 31795:         Industrial Boiler    36.55    -82.63
## 31796:         Industrial Boiler    36.55    -82.63
##                       owner_operator so2_phase n_ox_phase    unit_type
##                               <char>    <char>     <char>       <char>
##     1:                          <NA>   Phase 2       <NA>             
##     2:                          <NA>   Phase 2       <NA>             
##     3:                          <NA>   Phase 2       <NA>             
##     4:                          <NA>   Phase 2       <NA>             
##     5:                          <NA>   Phase 2       <NA>             
##    ---                                                                
## 31792: Pratt Paper (OH), LLC (Owner)                      Other boiler
## 31793:             U.S. Army (Owner)                      Other boiler
## 31794:             U.S. Army (Owner)                      Other boiler
## 31795:             U.S. Army (Owner)                      Other boiler
## 31796:             U.S. Army (Owner)                      Other boiler
##           primary_fuel_type secondary_fuel_type so2_controls
##                      <char>              <char>       <char>
##     1:                                                      
##     2:                                                      
##     3:                                                      
##     4:                                                      
##     5:                                                      
##    ---                                                      
## 31792: Pipeline Natural Gas                                 
## 31793: Pipeline Natural Gas          Diesel Oil             
## 31794: Pipeline Natural Gas          Diesel Oil             
## 31795: Pipeline Natural Gas          Diesel Oil             
## 31796: Pipeline Natural Gas          Diesel Oil             
##                                                                             n_ox_controls
##                                                                                    <char>
##     1:                                                                                   
##     2:                                                                                   
##     3:                                                                                   
##     4:                                                                                   
##     5:                                                                                   
##    ---                                                                                   
## 31792: Combustion Modification/Fuel Reburning|Low NOx Burner Technology (Dry Bottom only)
## 31793:          Low NOx Burner Technology (Dry Bottom only)|Selective Catalytic Reduction
## 31794:          Low NOx Burner Technology (Dry Bottom only)|Selective Catalytic Reduction
## 31795:          Low NOx Burner Technology (Dry Bottom only)|Selective Catalytic Reduction
## 31796:          Low NOx Burner Technology (Dry Bottom only)|Selective Catalytic Reduction
##        pm_controls hg_controls commercial_operation_date operating_status
##             <char>      <char>                    <IDat>           <char>
##     1:                    <NA>                1981-12-31                 
##     2:                    <NA>                1981-12-31                 
##     3:                    <NA>                1981-12-31                 
##     4:                    <NA>                1981-12-31                 
##     5:                    <NA>                1981-12-31                 
##    ---                                                                   
## 31792:                                        2019-09-25        Operating
## 31793:     Wet ESP                            2021-12-18        Operating
## 31794:     Wet ESP                            2021-12-21        Operating
## 31795:     Wet ESP                            2021-09-20        Operating
## 31796:     Wet ESP                            2021-09-20        Operating
##        max_hourly_hi_rate_mm_btu_hr
##                               <num>
##     1:                           NA
##     2:                           NA
##     3:                           NA
##     4:                           NA
##     5:                           NA
##    ---                             
## 31792:                        332.3
## 31793:                        327.0
## 31794:                        327.0
## 31795:                        327.0
## 31796:                        327.0
##        associated_generators_nameplate_capacity_m_we
##                                               <char>
##     1:                        1 (30), 2 (30), 3 (30)
##     2:                        2 (30), 1 (30), 3 (30)
##     3:                        1 (30), 3 (30), 2 (30)
##     4:                        3 (30), 1 (30), 2 (30)
##     5:                        1 (30), 3 (30), 2 (30)
##    ---                                              
## 31792:                                              
## 31793:                                              
## 31794:                                              
## 31795:                                              
## 31796:
# Save
readr::write_csv(facility_data_cleaned, here(out_dir_facilities, "EPA_CAMPD_facilities_data.csv"))

Mapping EPA CAMPD Facility Data

Filter out any records missing latitude or longitude, then create an interactive leaflet map.

facility_data_loaded <- read_csv(here(out_dir_facilities, "EPA_CAMPD_facilities_data.csv"))

# Convert commercial_operation_date to Date type if needed
facility_data <- facility_data_loaded %>% 
  mutate(commercial_operation_date = as.Date(commercial_operation_date))

# Aggregate data by facility (unique facility_id & facility_name)
facility_summary <- facility_data %>%
  group_by(state, facility_id, facility_name) %>%
  summarise(
    n_units = n_distinct(unit_id), 
    first_operation = min(commercial_operation_date, na.rm = TRUE),
    year_range = paste0(min(year, na.rm = TRUE), "-",
                        max(year, na.rm = TRUE)),
    latitude = first(latitude),
    longitude = first(longitude),
    program_code = first(program_code),
    operating_status = first(operating_status),
    .groups = "drop"
  )
## Warning: There were 22 warnings in `summarise()`.
## The first warning was:
## ℹ In argument: `first_operation = min(commercial_operation_date, na.rm =
##   TRUE)`.
## ℹ In group 65: `state = "IL"`, `facility_id = 55823`, `facility_name =
##   "Indeck-Elwood Energy Center"`.
## Caused by warning in `min.default()`:
## ! no non-missing arguments to min; returning Inf
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 21 remaining warnings.
# Inspect summary
print(facility_summary)
## # A tibble: 419 × 10
##    state facility_id facility_name   n_units first_operation year_range latitude
##    <chr>       <dbl> <chr>             <int> <date>          <chr>         <dbl>
##  1 IL            384 Joliet 29             4 1965-04-09      1980-2023      41.5
##  2 IL            856 E D Edwards           3 1960-05-01      1980-2022      40.6
##  3 IL            859 R S Wallace           2 1952-07-01      1980-2006      40.5
##  4 IL            861 Coffeen               2 1965-12-20      1980-2019      39.1
##  5 IL            862 Grand Tower En…       5 1951-03-01      1980-2024      37.7
##  6 IL            863 Hutsonville           2 1953-02-01      1980-2013      39.1
##  7 IL            864 Meredosia             6 1948-06-01      1980-2015      39.8
##  8 IL            867 Crawford              2 1958-05-23      1980-2013      41.8
##  9 IL            874 Joliet 9              1 1959-06-12      1980-2023      41.5
## 10 IL            876 Kincaid Genera…       2 1967-06-07      1980-2025      39.6
## # ℹ 409 more rows
## # ℹ 3 more variables: longitude <dbl>, program_code <chr>,
## #   operating_status <chr>
# Filter out facilities with missing coordinates
facility_map_data <- facility_summary %>% 
  filter(!is.na(latitude) & !is.na(longitude))

# Define custom awesome markers
customIcon <- awesomeIcons(
  icon = "industry",      # facilities
  iconColor = "white",
  markerColor = "steelblue",   # Blue markers
  library = "fa"
)

leaflet(data = facility_map_data) %>%
  addProviderTiles("CartoDB.DarkMatter") %>%  # Modern dark basemap
  addAwesomeMarkers(
    lng = ~longitude, 
    lat = ~latitude,
    icon = customIcon,
    popup = ~paste0("<strong>", facility_name, "</strong><br>",
                     "Facility ID: ", facility_id, "<br>",
                     "State: ", state, "<br>",
                     "First Operation: ", first_operation, "<br>",
                     "Year Range: ", year_range, "<br>",
                     "Program Code: ", program_code, "<br>",
                     "Units: ", n_units),
    clusterOptions = markerClusterOptions()  # Enable clustering to improve readability
  ) %>%
  addScaleBar(position = "bottomleft") %>%  # Add a scale bar
  addEasyButton(easyButton(
    icon = "fa-globe", title = "Reset View",
    onClick = JS("function(btn, map){ map.setView([39.8283, -98.5795], 4); }")
  ))

Set API Key and Emissions Query

For each state, we asked the EPA API for all years at once (1995–2024). Because the API only returns a limited number of rows per page, we read how many total rows were available, paged through everything, and stitched the pages together. We saved every state’s results to disk, then read them back in and converted the columns to the correct data types in one go so everything stays consistent.

# Set your API key
# api_key <- "YOUR_API_KEY_HERE"

# API base URLs
apiUrlBase <- "https://api.epa.gov/easey"
# Annual emissions endpoint URL
annualEmissionsPageUrl <- paste0(apiUrlBase, "/emissions-mgmt/emissions/apportioned/annual?API_KEY=", api_key)

# Selected States
selected_states <- c("PA", "OH", "IN", "IL", "KY", "TN", "WV") 

# Years
years <- 1995:2024

Querying EPA CAMPD Emissions Data

This document outlines our method for querying annual emissions data from the EPA CAMD API for each state over the period 1995–2024. We construct a query using pipe-separated values for years and state codes, handle pagination to retrieve all records, and save each state’s aggregated data as a CSV file.Downloading annual emissions data for each state from 1995 to 2024 from the EPA CAMD API. The data includes information on the Acid Rain Program, Cross-State Air Pollution Rule, and Clean Air Interstate Rule. The data is saved in CSV format for each state and data between 1995 and 2024 is included. The code handles pagination to ensure all data is retrieved, and it prints the total number of records for each state. The API key must be set before running the code.

# Helper to read x-total-count robustly
get_total_count <- function(res) {
  h <- names(res$headers)
  i <- which(tolower(h) == "x-total-count")
  if (!length(i)) return(NA_real_)
  suppressWarnings(as.numeric(res$headers[[i[1]]]))
}

# Small safe JSON parser
safe_fromJSON <- function(raw_txt) {
  tryCatch(jsonlite::fromJSON(raw_txt), error = function(e) NULL)
}

# Loop through each selected state
for (st in selected_states) {
  message("Processing state: ", st)

  # Build query parameters with pipe-separated years
  query <- list(
    year      = paste(years, collapse = '|'),
    stateCode = st,
    page      = 1,
    perPage   = 100 # 500 is the maximum per page
  )

  # Make an initial GET request to determine the total number of records
  res <- GET(annualEmissionsPageUrl, query = query)
  if (res$status_code > 399) {
    errorFrame <- safe_fromJSON(rawToChar(res$content))
    stop(paste(
      "HTTP", res$status_code, "->",
      if (!is.null(errorFrame) && !is.null(errorFrame$error$message))
        errorFrame$error$message
      else rawToChar(res$content)
    ))
  }

  totalRowsAvailableForQuery <- get_total_count(res)
  if (is.na(totalRowsAvailableForQuery)) totalRowsAvailableForQuery <- 0

  message("Total records for ", st, ": ", totalRowsAvailableForQuery)

  if (totalRowsAvailableForQuery == 0) {
    message("No rows returned for ", st, ". Skipping…")
    next
  }

  # Calculate total pages required
  totalPages <- ceiling(totalRowsAvailableForQuery / query$perPage)
  message("Total pages to fetch: ", totalPages)

  # Pre-allocate list
  allData <- vector("list", totalPages)
  page <- 1

  # Loop through pages until all pages are fetched
  while (page <= totalPages) {
    query$page <- page
    res <- GET(annualEmissionsPageUrl, query = query)

    if (res$status_code > 399) {
      errorFrame <- safe_fromJSON(rawToChar(res$content))
      stop(paste(
        "HTTP", res$status_code, "->",
        if (!is.null(errorFrame) && !is.null(errorFrame$error$message))
          errorFrame$error$message
        else rawToChar(res$content)
      ))
    }

    dataPage <- safe_fromJSON(rawToChar(res$content))

    if (is.null(dataPage) || (is.data.frame(dataPage) && nrow(dataPage) == 0)) {
      message("Empty page (", page, "). Breaking.")
      break
    }

    allData[[page]] <- tibble::as_tibble(dataPage)
    message("Fetched page ", page, " / ", totalPages)
    page <- page + 1

    Sys.sleep(0.25)  # be nice to the API
  }

  # Combine all pages into a single data frame
  combinedAnnualEmissData <- dplyr::bind_rows(allData) %>%
    clean_names()
  print(combinedAnnualEmissData, n = 3)

  # Write the data to a CSV file for the current state
  outputFile <- here::here(
    "Data", "EPA_CAMPD_Facility_Summary_Data", "Emissions_Data",
    paste0("EPA_CAMPD_Annual_Emissions_", st, "_", min(years),
           "_", max(years), ".csv")
  )
  dir.create(dirname(outputFile), recursive = TRUE, showWarnings = FALSE)
  readr::write_csv(combinedAnnualEmissData, outputFile)
  message("Saved file: ", outputFile)
}
## Processing state: PA
## Total records for PA: 5484
## Total pages to fetch: 55
## Fetched page 1 / 55
## Fetched page 2 / 55
## Fetched page 3 / 55
## Fetched page 4 / 55
## Fetched page 5 / 55
## Fetched page 6 / 55
## Fetched page 7 / 55
## Fetched page 8 / 55
## Fetched page 9 / 55
## Fetched page 10 / 55
## Fetched page 11 / 55
## Fetched page 12 / 55
## Fetched page 13 / 55
## Fetched page 14 / 55
## Fetched page 15 / 55
## Fetched page 16 / 55
## Fetched page 17 / 55
## Fetched page 18 / 55
## Fetched page 19 / 55
## Fetched page 20 / 55
## Fetched page 21 / 55
## Fetched page 22 / 55
## Fetched page 23 / 55
## Fetched page 24 / 55
## Fetched page 25 / 55
## Fetched page 26 / 55
## Fetched page 27 / 55
## Fetched page 28 / 55
## Fetched page 29 / 55
## Fetched page 30 / 55
## Fetched page 31 / 55
## Fetched page 32 / 55
## Fetched page 33 / 55
## Fetched page 34 / 55
## Fetched page 35 / 55
## Fetched page 36 / 55
## Fetched page 37 / 55
## Fetched page 38 / 55
## Fetched page 39 / 55
## Fetched page 40 / 55
## Fetched page 41 / 55
## Fetched page 42 / 55
## Fetched page 43 / 55
## Fetched page 44 / 55
## Fetched page 45 / 55
## Fetched page 46 / 55
## Fetched page 47 / 55
## Fetched page 48 / 55
## Fetched page 49 / 55
## Fetched page 50 / 55
## Fetched page 51 / 55
## Fetched page 52 / 55
## Fetched page 53 / 55
## Fetched page 54 / 55
## Fetched page 55 / 55
## # A tibble: 5,484 × 25
##   state_code facility_name           facility_id unit_id associated_stacks  year
##   <chr>      <chr>                         <int> <chr>   <chr>             <int>
## 1 PA         Brunot Island Power St…        3096 1A      <NA>               2002
## 2 PA         Brunot Island Power St…        3096 1B      <NA>               2002
## 3 PA         Brunot Island Power St…        3096 1C      <NA>               2002
## # ℹ 5,481 more rows
## # ℹ 19 more variables: count_op_time <int>, sum_op_time <dbl>,
## #   gross_load <dbl>, steam_load <dbl>, so2mass <dbl>, so2rate <dbl>,
## #   co2mass <dbl>, co2rate <dbl>, nox_mass <dbl>, nox_rate <dbl>,
## #   heat_input <dbl>, primary_fuel_info <chr>, secondary_fuel_info <chr>,
## #   unit_type <chr>, so2control_info <chr>, nox_control_info <chr>,
## #   pm_control_info <chr>, hg_control_info <chr>, program_code_info <chr>
## Saved file: /Users/robertjdellinger/Documents/Github/critical-ecology-hubbardbrook-project/Data/EPA_CAMPD_Facility_Summary_Data/Emissions_Data/EPA_CAMPD_Annual_Emissions_PA_1995_2024.csv
## Processing state: OH
## Total records for OH: 4608
## Total pages to fetch: 47
## Fetched page 1 / 47
## Fetched page 2 / 47
## Fetched page 3 / 47
## Fetched page 4 / 47
## Fetched page 5 / 47
## Fetched page 6 / 47
## Fetched page 7 / 47
## Fetched page 8 / 47
## Fetched page 9 / 47
## Fetched page 10 / 47
## Fetched page 11 / 47
## Fetched page 12 / 47
## Fetched page 13 / 47
## Fetched page 14 / 47
## Fetched page 15 / 47
## Fetched page 16 / 47
## Fetched page 17 / 47
## Fetched page 18 / 47
## Fetched page 19 / 47
## Fetched page 20 / 47
## Fetched page 21 / 47
## Fetched page 22 / 47
## Fetched page 23 / 47
## Fetched page 24 / 47
## Fetched page 25 / 47
## Fetched page 26 / 47
## Fetched page 27 / 47
## Fetched page 28 / 47
## Fetched page 29 / 47
## Fetched page 30 / 47
## Fetched page 31 / 47
## Fetched page 32 / 47
## Fetched page 33 / 47
## Fetched page 34 / 47
## Fetched page 35 / 47
## Fetched page 36 / 47
## Fetched page 37 / 47
## Fetched page 38 / 47
## Fetched page 39 / 47
## Fetched page 40 / 47
## Fetched page 41 / 47
## Fetched page 42 / 47
## Fetched page 43 / 47
## Fetched page 44 / 47
## Fetched page 45 / 47
## Fetched page 46 / 47
## Fetched page 47 / 47
## # A tibble: 4,608 × 25
##   state_code facility_name facility_id unit_id associated_stacks  year
##   <chr>      <chr>               <int> <chr>   <chr>             <int>
## 1 OH         Refuse & Coal         312 001     <NA>               1997
## 2 OH         Refuse & Coal         312 001     <NA>               1998
## 3 OH         Refuse & Coal         312 001     <NA>               1999
## # ℹ 4,605 more rows
## # ℹ 19 more variables: count_op_time <int>, sum_op_time <dbl>,
## #   gross_load <dbl>, steam_load <dbl>, so2mass <dbl>, so2rate <dbl>,
## #   co2mass <dbl>, co2rate <dbl>, nox_mass <dbl>, nox_rate <dbl>,
## #   heat_input <dbl>, primary_fuel_info <chr>, secondary_fuel_info <chr>,
## #   unit_type <chr>, so2control_info <chr>, nox_control_info <chr>,
## #   pm_control_info <chr>, hg_control_info <chr>, program_code_info <chr>
## Saved file: /Users/robertjdellinger/Documents/Github/critical-ecology-hubbardbrook-project/Data/EPA_CAMPD_Facility_Summary_Data/Emissions_Data/EPA_CAMPD_Annual_Emissions_OH_1995_2024.csv
## Processing state: IN
## Total records for IN: 4394
## Total pages to fetch: 44
## Fetched page 1 / 44
## Fetched page 2 / 44
## Fetched page 3 / 44
## Fetched page 4 / 44
## Fetched page 5 / 44
## Fetched page 6 / 44
## Fetched page 7 / 44
## Fetched page 8 / 44
## Fetched page 9 / 44
## Fetched page 10 / 44
## Fetched page 11 / 44
## Fetched page 12 / 44
## Fetched page 13 / 44
## Fetched page 14 / 44
## Fetched page 15 / 44
## Fetched page 16 / 44
## Fetched page 17 / 44
## Fetched page 18 / 44
## Fetched page 19 / 44
## Fetched page 20 / 44
## Fetched page 21 / 44
## Fetched page 22 / 44
## Fetched page 23 / 44
## Fetched page 24 / 44
## Fetched page 25 / 44
## Fetched page 26 / 44
## Fetched page 27 / 44
## Fetched page 28 / 44
## Fetched page 29 / 44
## Fetched page 30 / 44
## Fetched page 31 / 44
## Fetched page 32 / 44
## Fetched page 33 / 44
## Fetched page 34 / 44
## Fetched page 35 / 44
## Fetched page 36 / 44
## Fetched page 37 / 44
## Fetched page 38 / 44
## Fetched page 39 / 44
## Fetched page 40 / 44
## Fetched page 41 / 44
## Fetched page 42 / 44
## Fetched page 43 / 44
## Fetched page 44 / 44
## # A tibble: 4,394 × 25
##   state_code facility_name           facility_id unit_id associated_stacks  year
##   <chr>      <chr>                         <int> <chr>   <chr>             <int>
## 1 IN         State Line Generating …         981 3       <NA>               1995
## 2 IN         State Line Generating …         981 3       <NA>               1996
## 3 IN         State Line Generating …         981 3       <NA>               1997
## # ℹ 4,391 more rows
## # ℹ 19 more variables: count_op_time <int>, sum_op_time <dbl>,
## #   gross_load <dbl>, steam_load <dbl>, so2mass <dbl>, so2rate <dbl>,
## #   co2mass <dbl>, co2rate <dbl>, nox_mass <dbl>, nox_rate <dbl>,
## #   heat_input <dbl>, primary_fuel_info <chr>, secondary_fuel_info <chr>,
## #   unit_type <chr>, so2control_info <chr>, nox_control_info <chr>,
## #   pm_control_info <chr>, hg_control_info <chr>, program_code_info <chr>
## Saved file: /Users/robertjdellinger/Documents/Github/critical-ecology-hubbardbrook-project/Data/EPA_CAMPD_Facility_Summary_Data/Emissions_Data/EPA_CAMPD_Annual_Emissions_IN_1995_2024.csv
## Processing state: IL
## Total records for IL: 6517
## Total pages to fetch: 66
## Fetched page 1 / 66
## Fetched page 2 / 66
## Fetched page 3 / 66
## Fetched page 4 / 66
## Fetched page 5 / 66
## Fetched page 6 / 66
## Fetched page 7 / 66
## Fetched page 8 / 66
## Fetched page 9 / 66
## Fetched page 10 / 66
## Fetched page 11 / 66
## Fetched page 12 / 66
## Fetched page 13 / 66
## Fetched page 14 / 66
## Fetched page 15 / 66
## Fetched page 16 / 66
## Fetched page 17 / 66
## Fetched page 18 / 66
## Fetched page 19 / 66
## Fetched page 20 / 66
## Fetched page 21 / 66
## Fetched page 22 / 66
## Fetched page 23 / 66
## Fetched page 24 / 66
## Fetched page 25 / 66
## Fetched page 26 / 66
## Fetched page 27 / 66
## Fetched page 28 / 66
## Fetched page 29 / 66
## Fetched page 30 / 66
## Fetched page 31 / 66
## Fetched page 32 / 66
## Fetched page 33 / 66
## Fetched page 34 / 66
## Fetched page 35 / 66
## Fetched page 36 / 66
## Fetched page 37 / 66
## Fetched page 38 / 66
## Fetched page 39 / 66
## Fetched page 40 / 66
## Fetched page 41 / 66
## Fetched page 42 / 66
## Fetched page 43 / 66
## Fetched page 44 / 66
## Fetched page 45 / 66
## Fetched page 46 / 66
## Fetched page 47 / 66
## Fetched page 48 / 66
## Fetched page 49 / 66
## Fetched page 50 / 66
## Fetched page 51 / 66
## Fetched page 52 / 66
## Fetched page 53 / 66
## Fetched page 54 / 66
## Fetched page 55 / 66
## Fetched page 56 / 66
## Fetched page 57 / 66
## Fetched page 58 / 66
## Fetched page 59 / 66
## Fetched page 60 / 66
## Fetched page 61 / 66
## Fetched page 62 / 66
## Fetched page 63 / 66
## Fetched page 64 / 66
## Fetched page 65 / 66
## Fetched page 66 / 66
## # A tibble: 6,517 × 25
##   state_code facility_name facility_id unit_id associated_stacks  year
##   <chr>      <chr>               <int> <chr>   <chr>             <int>
## 1 IL         Joliet 29             384 71      CS7172             1995
## 2 IL         Joliet 29             384 71      CS7172             1996
## 3 IL         Joliet 29             384 71      CS7172             1997
## # ℹ 6,514 more rows
## # ℹ 19 more variables: count_op_time <int>, sum_op_time <dbl>,
## #   gross_load <dbl>, steam_load <dbl>, so2mass <dbl>, so2rate <dbl>,
## #   co2mass <dbl>, co2rate <dbl>, nox_mass <dbl>, nox_rate <dbl>,
## #   heat_input <dbl>, primary_fuel_info <chr>, secondary_fuel_info <chr>,
## #   unit_type <chr>, so2control_info <chr>, nox_control_info <chr>,
## #   pm_control_info <chr>, hg_control_info <chr>, program_code_info <chr>
## Saved file: /Users/robertjdellinger/Documents/Github/critical-ecology-hubbardbrook-project/Data/EPA_CAMPD_Facility_Summary_Data/Emissions_Data/EPA_CAMPD_Annual_Emissions_IL_1995_2024.csv
## Processing state: KY
## Total records for KY: 2773
## Total pages to fetch: 28
## Fetched page 1 / 28
## Fetched page 2 / 28
## Fetched page 3 / 28
## Fetched page 4 / 28
## Fetched page 5 / 28
## Fetched page 6 / 28
## Fetched page 7 / 28
## Fetched page 8 / 28
## Fetched page 9 / 28
## Fetched page 10 / 28
## Fetched page 11 / 28
## Fetched page 12 / 28
## Fetched page 13 / 28
## Fetched page 14 / 28
## Fetched page 15 / 28
## Fetched page 16 / 28
## Fetched page 17 / 28
## Fetched page 18 / 28
## Fetched page 19 / 28
## Fetched page 20 / 28
## Fetched page 21 / 28
## Fetched page 22 / 28
## Fetched page 23 / 28
## Fetched page 24 / 28
## Fetched page 25 / 28
## Fetched page 26 / 28
## Fetched page 27 / 28
## Fetched page 28 / 28
## # A tibble: 2,773 × 25
##   state_code facility_name           facility_id unit_id associated_stacks  year
##   <chr>      <chr>                         <int> <chr>   <chr>             <int>
## 1 KY         Smith Generating Facil…          54 SCT1    <NA>               2000
## 2 KY         Smith Generating Facil…          54 SCT1    <NA>               2001
## 3 KY         Smith Generating Facil…          54 SCT1    <NA>               2002
## # ℹ 2,770 more rows
## # ℹ 19 more variables: count_op_time <int>, sum_op_time <dbl>,
## #   gross_load <dbl>, steam_load <dbl>, so2mass <dbl>, so2rate <dbl>,
## #   co2mass <dbl>, co2rate <dbl>, nox_mass <dbl>, nox_rate <dbl>,
## #   heat_input <dbl>, primary_fuel_info <chr>, secondary_fuel_info <chr>,
## #   unit_type <chr>, so2control_info <chr>, nox_control_info <chr>,
## #   pm_control_info <chr>, hg_control_info <chr>, program_code_info <chr>
## Saved file: /Users/robertjdellinger/Documents/Github/critical-ecology-hubbardbrook-project/Data/EPA_CAMPD_Facility_Summary_Data/Emissions_Data/EPA_CAMPD_Annual_Emissions_KY_1995_2024.csv
## Processing state: TN
## Total records for TN: 2607
## Total pages to fetch: 27
## Fetched page 1 / 27
## Fetched page 2 / 27
## Fetched page 3 / 27
## Fetched page 4 / 27
## Fetched page 5 / 27
## Fetched page 6 / 27
## Fetched page 7 / 27
## Fetched page 8 / 27
## Fetched page 9 / 27
## Fetched page 10 / 27
## Fetched page 11 / 27
## Fetched page 12 / 27
## Fetched page 13 / 27
## Fetched page 14 / 27
## Fetched page 15 / 27
## Fetched page 16 / 27
## Fetched page 17 / 27
## Fetched page 18 / 27
## Fetched page 19 / 27
## Fetched page 20 / 27
## Fetched page 21 / 27
## Fetched page 22 / 27
## Fetched page 23 / 27
## Fetched page 24 / 27
## Fetched page 25 / 27
## Fetched page 26 / 27
## Fetched page 27 / 27
## # A tibble: 2,607 × 25
##   state_code facility_name facility_id unit_id associated_stacks  year
##   <chr>      <chr>               <int> <chr>   <chr>             <int>
## 1 TN         Allen                3393 1       <NA>               1995
## 2 TN         Allen                3393 1       <NA>               1996
## 3 TN         Allen                3393 1       <NA>               1997
## # ℹ 2,604 more rows
## # ℹ 19 more variables: count_op_time <int>, sum_op_time <dbl>,
## #   gross_load <dbl>, steam_load <dbl>, so2mass <dbl>, so2rate <dbl>,
## #   co2mass <dbl>, co2rate <dbl>, nox_mass <dbl>, nox_rate <dbl>,
## #   heat_input <dbl>, primary_fuel_info <chr>, secondary_fuel_info <chr>,
## #   unit_type <chr>, so2control_info <chr>, nox_control_info <chr>,
## #   pm_control_info <chr>, hg_control_info <lgl>, program_code_info <chr>
## Saved file: /Users/robertjdellinger/Documents/Github/critical-ecology-hubbardbrook-project/Data/EPA_CAMPD_Facility_Summary_Data/Emissions_Data/EPA_CAMPD_Annual_Emissions_TN_1995_2024.csv
## Processing state: WV
## Total records for WV: 1798
## Total pages to fetch: 18
## Fetched page 1 / 18
## Fetched page 2 / 18
## Fetched page 3 / 18
## Fetched page 4 / 18
## Fetched page 5 / 18
## Fetched page 6 / 18
## Fetched page 7 / 18
## Fetched page 8 / 18
## Fetched page 9 / 18
## Fetched page 10 / 18
## Fetched page 11 / 18
## Fetched page 12 / 18
## Fetched page 13 / 18
## Fetched page 14 / 18
## Fetched page 15 / 18
## Fetched page 16 / 18
## Fetched page 17 / 18
## Fetched page 18 / 18
## # A tibble: 1,798 × 25
##   state_code facility_name facility_id unit_id associated_stacks  year
##   <chr>      <chr>               <int> <chr>   <chr>             <int>
## 1 WV         John E Amos          3935 1       CS012              1995
## 2 WV         John E Amos          3935 1       CS012              1996
## 3 WV         John E Amos          3935 1       CS012              1997
## # ℹ 1,795 more rows
## # ℹ 19 more variables: count_op_time <int>, sum_op_time <dbl>,
## #   gross_load <dbl>, steam_load <dbl>, so2mass <dbl>, so2rate <dbl>,
## #   co2mass <dbl>, co2rate <dbl>, nox_mass <dbl>, nox_rate <dbl>,
## #   heat_input <dbl>, primary_fuel_info <chr>, secondary_fuel_info <chr>,
## #   unit_type <chr>, so2control_info <chr>, nox_control_info <chr>,
## #   pm_control_info <chr>, hg_control_info <chr>, program_code_info <chr>
## Saved file: /Users/robertjdellinger/Documents/Github/critical-ecology-hubbardbrook-project/Data/EPA_CAMPD_Facility_Summary_Data/Emissions_Data/EPA_CAMPD_Annual_Emissions_WV_1995_2024.csv
files <- dir_ls(
  here("Data", "EPA_CAMPD_Facility_Summary_Data", "Emissions_Data"),
  regexp = "^.*/EPA_CAMPD_Annual_Emissions_[A-Z]{2}_[0-9]{4}_[0-9]{4}\\.csv$"
)

stopifnot(length(files) > 0)

read_one_chr <- function(f) {
  st <- str_match(basename(f), "EPA_CAMPD_Annual_Emissions_([A-Z]{2})_")[, 2]
  read_csv(
    f,
    col_types = cols(.default = col_character()),
    progress  = FALSE
  ) %>%
    mutate(stateCode = dplyr::coalesce(state_code, st),
           .file     = basename(f))  # keep provenance if useful
}

combined_chr <- purrr::map_dfr(files, read_one_chr)

# Let readr guess types once, consistently
combined_annual_emissions <- type_convert(combined_chr, guess_integer = TRUE)
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_character(),
##   facility_id = col_integer(),
##   year = col_integer(),
##   count_op_time = col_integer(),
##   sum_op_time = col_double(),
##   gross_load = col_double(),
##   steam_load = col_double(),
##   so2mass = col_double(),
##   so2rate = col_double(),
##   co2mass = col_double(),
##   co2rate = col_double(),
##   nox_mass = col_double(),
##   nox_rate = col_double(),
##   heat_input = col_double()
## )
## ℹ Use `spec()` for the full column specifications.

Combined EPA CAMPD Facility/Emissions Data

We connected the emissions to the facility details using the shared identifiers (state code, facility and unit IDs, stacks, and year). After merging, we verified that coordinates and county codes were still present and valid.

facility_info <- read_csv(here("Data", "EPA_CAMPD_Facility_Summary_Data", "Facilities_Data", "EPA_CAMPD_facilities_data.csv")) %>% 
  dplyr::filter(between(year, 1995, 2024))

fac_attrs <- facility_info %>%
  dplyr::select(
    county_code, facility_name, facility_id, unit_id, associated_stacks,
    year, program_code, operating_status, commercial_operation_date,
    source_category, latitude, longitude
  ) %>%
  tidyr::extract(
    county_code,
    into = c("state_code", "county_fips"),
    regex = "^([A-Za-z]{2})(\\d{3})$",
    remove = TRUE
  )  %>%
  mutate(
    state_code  = as.character(state_code), 
    facility_id = as.integer(facility_id),
    unit_id     = as.character(unit_id),
    year        = as.integer(year)
  )

combined_facility_emissions <- combined_annual_emissions %>%
  dplyr::select(-secondary_fuel_info, -so2control_info, -nox_control_info, -pm_control_info, -hg_control_info) %>%
  left_join(
    fac_attrs,
    by = c("state_code", "facility_id", "facility_name",
           "associated_stacks", "unit_id", "year")
  )

write_csv(
  combined_facility_emissions,
  here("Output", "Sheets", "CAMPD_facility_summaries_1995_2020.csv")
)

Visualizing EPA CAMPD Emissions Data

states_of_interest <- c("IL","IN","KY","OH","PA","TN","WV")

# grab state boundaries for context
states_filtered <- tigris::states(cb = TRUE, year = 2020, class = "sf") %>%
  filter(STUSPS %in% states_of_interest) %>%
  st_transform(4326)

# helper for parsed labels
chem_expr <- function(pol) {
  switch(pol,
         "SO2" = quote(SO[2]),
         "NOX" = quote(NO[x]),
         "CO2" = quote(CO[2]),
         as.name(pol))
}

# choose the pollutants (columns in your data)
pollutants <- c(SO2 = "so2mass", NOX = "nox_mass", CO2 = "co2mass")

# pick “milestone” years, adjust if you want more/less
years_keep <- c(1995, 2000, 2005, 2010, 2015, 2020, 2024)

# make a long table: state-year-facility-unit-pollutant, mass
camdp_long <- combined_facility_emissions %>%
  filter(state_code %in% states_of_interest,
         year %in% years_keep) %>%
  dplyr::select(
    state_code, year, facility_id, facility_name,
    unit_id, latitude, longitude,
    so2mass, nox_mass, co2mass
  ) %>%
  pivot_longer(cols = c(so2mass, nox_mass, co2mass),
               names_to = "pollutant_var", values_to = "mass") %>%
  mutate(
    pollutant = recode(pollutant_var,
                       so2mass = "SO2",
                       nox_mass = "NOX",
                       co2mass = "CO2")
  )

# sum to facility (not unit) per year/pollutant
facility_year_pol <- camdp_long %>%
  group_by(state_code, year, pollutant, facility_id,
           facility_name, latitude, longitude) %>%
  summarise(total_emissions = sum(mass, na.rm = TRUE), .groups = "drop") %>%
  filter(is.finite(latitude), is.finite(longitude)) %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326)

# a generic bubble plot (SO2 / NOX / CO2)
bubble_plot_camdp <- function(df, pol,
                              color      = "royalblue1",
                              years      = years_keep,
                              cap_pct    = 0.99,
                              max_size   = 1.2,
                              alpha      = 0.25,
                              xlim       = c(-92.5, -72.5),
                              ylim       = c(34, 44)) {

  lab_short <- scales::label_number(scale_cut = scales::cut_short_scale())

  df_pol <- df %>%
    filter(pollutant == pol,
           total_emissions > 0,
           year %in% years) %>%
    mutate(year = factor(year, levels = years))

  if (!nrow(df_pol))
    return(ggplot() + theme_void() + ggtitle(paste(pol, "- no data")))

  upper <- as.numeric(quantile(df_pol$total_emissions, cap_pct, na.rm = TRUE))
  lower <- min(df_pol$total_emissions[df_pol$total_emissions > 0],
               na.rm = TRUE)

  df_pol <- df_pol %>%
    mutate(total_emissions_plot = pmin(pmax(total_emissions, lower), upper))

  # 4–5 breaks works well
  brks <- scales::breaks_log(n = 5)(c(lower, upper))

  ggplot() +
    geom_sf(data = states_filtered, fill = NA, color = "grey20",
            linewidth = 0.2) +
    geom_sf(data = df_pol,
            aes(size = total_emissions_plot),
            color = color, alpha = alpha, show.legend = TRUE) +
    coord_sf(xlim = xlim, ylim = ylim, expand = FALSE, datum = NA) +
    scale_size_continuous(
      range  = c(0, max_size),
      limits = c(lower, upper),
      trans  = "log10",
      breaks = brks,
      labels = lab_short,
      name   = "Emissions (tons, capped)"
    ) +
    facet_wrap(~ year, drop = FALSE) +
    labs(
      title = bquote(.(chem_expr(pol)) ~ "Emissions by Facility (EPA CAMD)"),
      subtitle = "Distribution & Intensity (tons)"
    ) +
    theme_minimal(base_size = 11) +
    theme(
      legend.position = "bottom",
      plot.title      = element_text(hjust = 0.5),
      plot.subtitle   = element_text(hjust = 0.5),
      panel.background  = element_rect(fill = NA, colour = NA),
      plot.background   = element_rect(fill = NA, colour = NA),
      legend.background = element_rect(fill = NA, colour = NA)
    )
}

p_so2 <- bubble_plot_camdp(facility_year_pol, "SO2", color = "royalblue1")
p_nox <- bubble_plot_camdp(facility_year_pol, "NOX", color = "orange")
p_co2 <- bubble_plot_camdp(facility_year_pol, "CO2", color = "orangered")

print(p_so2)

print(p_nox)

print(p_co2)

ggsave(here::here("Output", "Figures", "EPA_CAMD_facility_SO2_distribution_intensity.png"),
       p_so2, width = 9, height = 6.5, dpi = 600)
ggsave(here::here("Output", "Figures", "EPA_CAMD_facility_NOX_distribution_intensity.png"),
       p_nox, width = 9, height = 6.5, dpi = 600)
ggsave(here::here("Output", "Figures", "EPA_CAMD_facility_CO2_distribution_intensity.png"),
       p_co2, width = 9, height = 6.5, dpi = 600)
# Get county polygons 
counties_sf <- tigris::counties(state = states_of_interest,
       cb = TRUE, year = 2020, class = "sf")  %>% 
  st_transform(4326) %>% 
  janitor::clean_names()

state_lu <- tigris::fips_codes %>%
  dplyr::select(state = state, state_fips2 = state_code) %>%
  dplyr::distinct()

county_totals <- combined_facility_emissions %>%
  # keep only states you’ll map (optional)
  dplyr::filter(state_code %in% states_of_interest) %>%
  # make sure numeric
  dplyr::mutate(
    so2mass  = as.numeric(so2mass),
    nox_mass = as.numeric(nox_mass),
    co2mass  = as.numeric(co2mass)
  ) %>%
  tidyr::pivot_longer(
    c(so2mass, nox_mass, co2mass),
    names_to = "pollutant_var", values_to = "mass"
  ) %>%
  dplyr::mutate(
    pollutant = dplyr::recode(pollutant_var,
      so2mass = "SO2", nox_mass = "NOX", co2mass = "CO2"
    )
  ) %>%
  # add 2-digit state FIPS so we can form a 5-digit GEOID
  dplyr::left_join(state_lu, by = c("state_code" = "state")) %>%
  dplyr::mutate(
    geoid = sprintf("%02s%03s", state_fips2, county_fips)
  ) %>%
  dplyr::group_by(geoid, pollutant) %>%
  dplyr::summarise(total_emissions = sum(mass, na.rm = TRUE), .groups = "drop")

pollutant_labs <- c(SO2 = "SO[2]", NOX = "NO[x]", CO2 = "CO[2]")

make_county_map <- function(counties_sf, county_totals, pol, option = "C") {

  df_pol <- counties_sf %>%
    left_join(
      county_totals %>% filter(toupper(pollutant) == toupper(pol)),
      by = "geoid"
    ) %>%
    filter(is.finite(total_emissions), total_emissions > 0)   # <- drop zeros/negatives

  if (!nrow(df_pol)) return(ggplot() + theme_void() + 
                              ggtitle(paste(pol, "- no data")))
  
  pos_vals <- df_pol$total_emissions
  rng <- range(pos_vals, na.rm = TRUE)
  log10_breaks <- function(rng, n = 5) {
    lo <- floor(log10(rng[1])); 
    hi <- ceiling(log10(rng[2])); if (lo == hi) hi <- lo + 1
    10^(unique(round(seq(lo, hi, length.out = n))))
  }
  brks <- log10_breaks(rng, n = 5)

  ggplot(df_pol) +
    geom_sf(aes(fill = total_emissions), color = NA) +
    scale_fill_viridis_c(
      option = option,
      trans  = "log10",
      breaks = brks,
      limits = range(brks),
      na.value = "grey95",
      name = "Emissions (Tons, log10)"
    ) +
    labs(
      title = bquote("Facility-Level Emissions by County, Total 1995–2024: " *
                       .(parse(text = pollutant_labs[[toupper(pol)]]))),
      subtitle = "EPA CAMD — Summed Facility Totals; IL, IN, KY, OH, PA, TN, WV",
      caption  = "Projection: WGS84 / EPSG:4326"
    ) +
    guides(fill = guide_colorbar(
      title.position = "top",
      barwidth = unit(15, "cm"),
      barheight = unit(0.4, "cm"),
      ticks = FALSE
    )) +
    theme_minimal(base_size = 11) +
    theme(
      panel.grid       = element_blank(),
      axis.title       = element_blank(),
      legend.position  = "bottom",
      legend.title     = element_text(margin = margin(b = 4)),
      panel.background  = element_rect(fill = NA, colour = NA),
      plot.background   = element_rect(fill = NA, colour = NA),
      legend.background = element_rect(fill = NA, colour = NA)
    )
}


# Build + save each map

p_so2 <- make_county_map(counties_sf, county_totals, "SO2")
p_nox <- make_county_map(counties_sf, county_totals, "NOX")
p_co2 <- make_county_map(counties_sf, county_totals, "CO2")

print(p_so2)

print(p_nox)

print(p_co2)

states_of_interest <- c("IL","IN","KY","OH","PA","TN","WV")

# Aggregate to state × year for the three pollutants
state_year_long <- combined_facility_emissions %>%
  filter(state_code %in% states_of_interest) %>%
  transmute(
    state_code, year,
    so2 = as.numeric(so2mass),
    nox = as.numeric(nox_mass),
    co2 = as.numeric(co2mass)
  ) %>%
  group_by(state_code, year) %>%
  summarise(
    so2 = sum(so2, na.rm = TRUE),
    nox = sum(nox, na.rm = TRUE),
    co2 = sum(co2, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  tidyr::pivot_longer(
    cols = c(so2, nox, co2),
    names_to = "pollutant",
    values_to = "total_tons"
  ) %>%
  mutate(
    pollutant = toupper(pollutant),
    pollutant_lab = dplyr::recode(pollutant,
      "SO2" = "SO[2]",
      "NOX" = "NO[x]",
      "CO2" = "CO[2]"
    )
  )

subtitle_expr <- bquote("Summed across facilities/units — "
                        * SO[2] * ", " * NO[x] * ", and " * CO[2]
                        * " (1995–2024)")

# Palette (one distinct color per state)
pal_states <- scales::hue_pal()(length(states_of_interest))
names(pal_states) <- states_of_interest

# Faceted plot (one panel per pollutant)
p_all <- ggplot(state_year_long,
                aes(x = year, y = total_tons,
                    color = state_code, group = state_code)) +
  geom_line(linewidth = 0.5, alpha = 0.7) +
  geom_point(size = 1) +
  scale_color_manual(values = pal_states, name = "State") +
  scale_y_continuous(labels = scales::comma) +
  facet_wrap(~ pollutant_lab, scales = "free_y", 
             labeller = label_parsed, ncol =1) +
  labs(
    title    = "State-Level Power Sector Emissions Over Time (EPA CAMD)",
    subtitle = subtitle_expr,
    x = NULL,
    y = "Emissions (tons)"
  ) +
  theme_minimal(base_size = 11) +
  theme(
    legend.position   = "bottom",
    panel.grid.minor  = element_blank(),
    strip.text        = element_text(face = "bold"),
    panel.background  = element_rect(fill = NA, colour = NA),
    plot.background   = element_rect(fill = NA, colour = NA),
    legend.background = element_rect(fill = NA, colour = NA)
  )

print(p_all)

# write out the plot
ggsave(here::here("Output", "Figures", "EPA_CAMPD_state_emissions_trends.png"),
       p_all, width = 9, height = 6.5, dpi = 600)

Citations

United States Environmental Protection Agency (EPA). “Power Sector Emissions Data.” Washington, DC: Office of Atmospheric Programs, Clean Air Markets Division. Available from EPA’s Air Markets Program Data web site: https://ampd.epa.gov.

United States Environmental Protection Agency (EPA). “Clean Air Markets Program Data.” Washington, DC: Office of Atmospheric Protection, Clean Air Markets Division. Available from EPA’s Air Markets Program Data web site: https://campd.epa.gov/.